Global Project Options

When you start a project is good practice to use here::i_am("<TypeProjectName>") to set up the top-level project directory. The base-R command to do that is setwd(), but this is fragile as it is operating system dependent as well as filesystem (absdolute and relative) dependent.

More info: https://here.r-lib.org

Importing Data

We will use meteorological data set from National Oceanic and Atmospheric Administration (NOAA). NOAA is an agency under the U.S. Dept. of Commerce that provides lots of data and information related to Climate. Go take a look at: https://www.noaa.gov.

Given we are in the San Francisco Bay Area, we might want to work on some local meteorological data. I’ve selected the Station FTPC1 - 9414290 - San Francisco, CA (37.806 N 122.466 W) and prepared a time-series containing measurement on: - Wind Direction (“WDIR”) - Wind Speed (“WSPD”) - Atmospheric Pressure (“PRES”) - Atmospheric Temperature (“ATMP”)

The time-series is in week3/data directory and the file name is meteo_SF_FTPC1_2011-2020.rds. A rds file is a native data formats in R. It is a serialized object that, compared to a csv occupy less space on drive.

Import the file in memory and assign it into the variable meteo_df:

#meteo_df <- read_rds(file = "week3/data/meteo_SF_FTPC1_2011-2020.rds")
meteo_df <- read_rds(file = here("week3", "data", "meteo_SF_FTPC1_2011-2020.rds"))

Note that we used the function read_rds with the file-name as the function argument. The file-name is the output of the function here(). Which are the argument in this case?

Once data are imported we can check some of the data-set characteristics:

Dataset inspection

dim()

Use dim() command to get information on meteo_df dimensions:

dim(meteo_df)
## [1] 870504      5

The output of the command gives you the number of rows and columns. Remember that you can execute ?dim in the Console to open the Help panel.

tail()

To see when the data-set ends we can use the tail() command:

tail(meteo_df)

The command works in a similar to head() but showing the last rows of the data.

str()

An useful command to look into the data structure:

str(meteo_df)
## tibble [870,504 × 5] (S3: tbl_df/tbl/data.frame)
##  $ TIME: chr [1:870504] "2011-01-01 00:00:00" "2011-01-01 00:06:00" "2011-01-01 00:12:00" "2011-01-01 00:18:00" ...
##  $ WDIR: num [1:870504] 3 3 1 8 8 12 11 19 8 13 ...
##  $ WSPD: num [1:870504] 4.3 4.3 4.6 4.4 4.4 3.9 3.2 3.4 3.1 3.5 ...
##  $ PRES: num [1:870504] 1018 1018 1018 1018 1018 ...
##  $ ATMP: num [1:870504] 8.2 8.2 8.2 8.1 8.1 8.1 8.2 8.3 8.2 8 ...

glimpse()

Another useful command to inspect the data-set:

glimpse(meteo_df) # this is a nice command that we will use a lot
## Rows: 870,504
## Columns: 5
## $ TIME <chr> "2011-01-01 00:00:00", "2011-01-01 00:06:00", "2011-01-01 00:12:0…
## $ WDIR <dbl> 3, 3, 1, 8, 8, 12, 11, 19, 8, 13, 7, 358, 358, 14, 7, 8, 12, 12, …
## $ WSPD <dbl> 4.3, 4.3, 4.6, 4.4, 4.4, 3.9, 3.2, 3.4, 3.1, 3.5, 3.9, 4.1, 4.1, …
## $ PRES <dbl> 1018.4, 1018.4, 1018.3, 1018.2, 1018.2, 1018.1, 1017.9, 1017.9, 1…
## $ ATMP <dbl> 8.2, 8.2, 8.2, 8.1, 8.1, 8.1, 8.2, 8.3, 8.2, 8.0, 8.0, 7.9, 7.9, …

summary()

To produce a table with a summary of each data variables we can use summary(). The function invokes particular methods which depend on the class of the first argument.

summary(meteo_df)
##      TIME                WDIR            WSPD            PRES       
##  Length:870504      Min.   :  0.0   Min.   :0.000   Min.   : 987.1  
##  Class :character   1st Qu.:197.0   1st Qu.:1.800   1st Qu.:1013.8  
##  Mode  :character   Median :238.0   Median :3.100   Median :1016.7  
##                     Mean   :239.7   Mean   :3.457   Mean   :1284.8  
##                     3rd Qu.:259.0   3rd Qu.:4.700   3rd Qu.:1020.5  
##                     Max.   :999.0   Max.   :9.900   Max.   :9999.0  
##       ATMP      
##  Min.   :  1.8  
##  1st Qu.: 11.4  
##  Median : 13.0  
##  Mean   : 52.6  
##  3rd Qu.: 14.9  
##  Max.   :999.0

Metadata

A data without metadata is almost useless… Metadata is data that provides information about other data but not the content of the data. Take a look at the wiki: https://en.wikipedia.org/wiki/Metadata.

Looking at the summary() output we can note that:

  • TIME is a character.. R is not recognizing it as a time object.
  • we do not know which Time Standard are user for TIME
  • there are some strange numbers… e.g. what is ATMP = 9999?

We need to know more about:

  • Time Standard
  • Variables’ units

To get these information we need to get back to the NOAA Station webpage and check data descriptions: https://www.ndbc.noaa.gov/measdes.shtml

From the link we can see that:

  • TIME is in UTC
  • WDIR is Wind direction in degrees clockwise from true N
  • WSPD Wind speed (m/s)
  • PRES Sea level pressure (hPa)
  • ATMP Air temperature (Celsius)
  • Strange 99. 999. 9999. numbers are NA or missing values

Create a Temperature dataset

In the following part of the class we will only use TIME and TEMPERATURE. We don’t need to carry around the other variables (for now…). Use the tidyverse sintax to subset the meteo_df and get a new data frame. Assign the new data frame to the variable temperature_df.

*Remind: we will use the %>% (known as the pipe) operator and the function select(). You can read the following code as: from meteo_df select variable TIME and ATMP and assign them to a new object called tempearature_df:

temperature_df <- meteo_df %>%
  select(TIME, ATMP)

head(temperature_df)

Time Standard

The Coordinated Universal Time or UTC is the primary time standard by which the world regulates clocks and time. The Reference Meridian

“World Time Zones Map” https://en.wikipedia.org/wiki/Coordinated_Universal_Time#/media/File:World_Time_Zones_Map.png

International Reference Meridian

Calculating time for navigation..

In 1714, the British government offered a longitude prize for a method of determining longitude at sea, with the awards ranging from ~£20,000 (~$2.6 million to ~$5.2 million in 2022 terms) depending on accuracy.

From 1730 to 1761 John Harrison worked on the creation of the first chronometer to measure the time differences between anywhere on the Earth and the Greenwich Observatory.

He solved the greatest proble of his time.. indeed.

lubridate to work with Time in R

So.. we should be ready to do some analysis. But first we need to tell R to consider TIME, still a vector of character strings, as a date/time object.

Date/time is very annoying for a computer.. how much is \(365 + 1\)?. It is not trivial.. is 366? 1?.. actually can be both if we are counting days of the year: leap year, occurring every 4 years (The Julian year is 365.25 days).

Does every day has 24 hr?

Dates and times are hard because they have to reconcile two physical phenomena (the rotation of the Earth and its orbit around the sun) with a whole raft of geopolitical phenomena including months, time zones, and Data Saving Time (DST).

The lubridate package, which makes it easier to work with dates and times in R. lubridate is not part of core tidyverse because you only need it when you’re working with dates/times. If we already imported the library good otherwise type library(lubridate) in the Console to load the package’s functions in memory.

With lubridate we can add to R two new class:

As you may have guessed.. we will use <dtty> objects!

Create date and dtty objs from string

We can use the ymd() family to convert a string to a date object:

dumb_data <- "2022-04-22"
typeof(dumb_data)
## [1] "character"
dumb_data <- lubridate::ymd(dumb_data)
print(dumb_data)
## [1] "2022-04-22"
typeof(dumb_data) # What we expect as output?
## [1] "double"
class(dumb_data) # It is a new class! Not a new type!
## [1] "Date"

Why is a double? We just said that it was a date… date/times as numeric offsets from the “Unix Epoch”, 1970-01-01… with offsets that can be seconds (for dtty) or days (for date).

as.numeric(ymd("2022-04-22"))
## [1] 19104
ymd("2022-04-22") - 19104
## [1] "1970-01-01"
as_date(19104)
## [1] "2022-04-22"
as_datetime(19104)
## [1] "1970-01-01 05:18:24 UTC"

What is the number of offset of now!? as.numeric(now())

as.numeric(ymd_hms("2022-04-22 15:00:00")) #as.numeric(now())
## [1] 1650639600
print(2 ^ .Machine$double.digits)
## [1] 9.007199e+15
as_datetime(2 ^ .Machine$double.digits) # when the world will finish (according to R) #options(digits = 22) #it is 53 default?
## [1] "285428751-11-12 07:36:32 UTC"
as_datetime(-1 * (2 ^ .Machine$double.digits)) # the genesis (according to R)
## [1] "-285424812-02-20 16:23:28 UTC"

Check https://en.wikipedia.org/wiki/Future_of_Earth.. at least for few years we should be good with this system. Later on some more critical problems may arise..

More examples on ymd():

Imagine that a real assh*** save a dtty object in a very uncommon way…

mdy("January 31st, 2017")
## [1] "2017-01-31"
dym("31, 2017, 1")
## [1] "2017-01-31"
dym("31, 2017, Jan")
## [1] "2017-01-31"
dym("31-2017;:-#ùì1")
## [1] "2017-01-31"
make_date(year = 2017, month = 01, day = 31)
## [1] "2017-01-31"
make_datetime(2017,01,31,12,00,00)
## [1] "2017-01-31 12:00:00 UTC"

Date-time components (not in class)

Some time is necessary to get data/time components:

  • year()
  • month()
  • yday()
  • mday()
  • wday()
  • hour()
  • minute()
  • second()
dumb_datetime <- ymd_hms("2022-04-22 15:34:56")
yday(dumb_datetime)
## [1] 112
wday(dumb_datetime)
## [1] 6
wday(dumb_datetime, label = TRUE)
## [1] Fri
## Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
wday(dumb_datetime, label = TRUE, abbr = FALSE)
## [1] Friday
## 7 Levels: Sunday < Monday < Tuesday < Wednesday < Thursday < ... < Saturday

Rounding data (not in class)

We can use the following functions to “round” the data:

  • floor_date()
  • round_date()
  • ceiling_date()
dumb_datetime <- ymd_hms("2022-04-22 15:34:56")
floor_date(dumb_datetime, unit = "year")
## [1] "2022-01-01 UTC"
floor_date(dumb_datetime, unit = "week")
## [1] "2022-04-17 UTC"
wday(floor_date(dumb_datetime, unit = "week"))
## [1] 1
wday(floor_date(dumb_datetime, unit = "week"), label = TRUE)
## [1] Sun
## Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
rm(dumb_datetime)

Setting components (not in class)

Change part of a date-time object and update():

(datetime <- ymd_hms("2016-07-08 12:34:56"))
## [1] "2016-07-08 12:34:56 UTC"
year(datetime) <- 2020
datetime
## [1] "2020-07-08 12:34:56 UTC"
month(datetime) <- 01
datetime
## [1] "2020-01-08 12:34:56 UTC"
hour(datetime) <- hour(datetime) + 1
datetime
## [1] "2020-01-08 13:34:56 UTC"
update(datetime, year = 2020, month = 2, mday = 2, hour = 2)
## [1] "2020-02-02 02:34:56 UTC"

Durations and Periods (not in class)

How old is Fede?

today() - ymd(19890117)
## Time difference of 12148 days
as.duration(today() - ymd(19890117))
## [1] "1049587200s (~33.26 years)"

We can use different durations (calculated in seconds.. behavior can be confusing when DST):

  • dseconds()
  • dminutes()
  • dhours()
  • ddays()
  • dweeks()
  • dyears()

We can use periods: time spans but don’t have a fixed length in seconds, instead they work with “human” times, like days and months.

Time-series analysis

Now that we have introduced lubridate we are ready to work on the data-set! Remember that we have to deal with temperature NAs!

ymd_hms()

We know that the TIME is in UTC, let’s use lubridate to change the class of the TIME character string to dtty:

temperature_df <- temperature_df %>%
  mutate(TIME = lubridate::ymd_hms(TIME, tz = "UTC"))

na_if()

We also know that some ATMP values are NAs. Use the function na_if to mutate them to NA so they can be handled by R:

temperature_df <- temperature_df %>%
  mutate(ATMP = na_if(x = ATMP, y = 999)) 

summary(temperature_df)
##       TIME                          ATMP      
##  Min.   :2011-01-01 00:00:00   Min.   : 1.80  
##  1st Qu.:2013-06-30 11:04:30   1st Qu.:11.30  
##  Median :2015-12-29 13:57:00   Median :12.90  
##  Mean   :2015-12-31 00:24:33   Mean   :12.99  
##  3rd Qu.:2018-06-30 22:01:30   3rd Qu.:14.60  
##  Max.   :2020-12-31 23:54:00   Max.   :34.40  
##                                NA's   :34964

Plot to check if data are sound

Now that we have the correct class for TIME and that the summary table for ATMP is sound it is a good idea to start plotting some part of the data-set.

With such huge dataset some sanity check are mandatory before starting a real analysis…

We can plot the temperature of a single day.. plotting the entire dataframe will not tell much… for instance the 1st January 2020:

temperature_df %>%
  filter(TIME >= "2020-01-01",
         TIME < "2020-01-02") %>%
  ggplot(data = ., aes(x = TIME, y = ATMP)) +
  geom_line() + 
  theme_classic()

We note that:

  • there are some missing values (this is ok)
  • we can see “steps” in temperature measurements.. this is the precision of the thermometer (quite precise indeed..)
  • why at noon we have the lower TEMPERATURE??

with_tz() considering local time

Data are in UTC! So we want to change time-zone to the Pacific Time Zone (PT). We will use the with_tz() function:

temperature_df <- temperature_df %>%
  mutate(TIME_LOCAL = lubridate::with_tz(TIME, tzone = "America/Los_Angeles")) %>%
  glimpse()
## Rows: 870,504
## Columns: 3
## $ TIME       <dttm> 2011-01-01 00:00:00, 2011-01-01 00:06:00, 2011-01-01 00:12…
## $ ATMP       <dbl> 8.2, 8.2, 8.2, 8.1, 8.1, 8.1, 8.2, 8.3, 8.2, 8.0, 8.0, 7.9,…
## $ TIME_LOCAL <dttm> 2010-12-31 16:00:00, 2010-12-31 16:06:00, 2010-12-31 16:12…

Here we

time-zones

The list of time-zone:

To check the time-zone from your console run grep("America", OlsonNames(), value=TRUE) in the R Console.

Redo the plot

Check if now the TIME and TEMPERATURE are reflecting local time:

temperature_df %>%
  select(TIME_LOCAL, ATMP) %>%
  filter(TIME_LOCAL >= "2020-01-01",
         TIME_LOCAL < "2020-01-02") %>%
  ggplot(data = ., aes(x = TIME_LOCAL, y = ATMP)) +
  geom_line() + 
  theme_classic()

Oh! We are very happy now!

duplicated() rows check

You may not expect to have to check this.. but local time is affected by DST!

Why we are checking as.character()? Guess..

There are duplicated values at the same months… every year… the same hour! Indeed.. on local DST there are one missing hour and one duplicated hour every year.

temperature_df$TIME[duplicated(as.character(temperature_df$TIME))]
## POSIXct of length 0
temperature_df$TIME_LOCAL[duplicated((temperature_df$TIME_LOCAL))] 
## POSIXct of length 0
temperature_df$TIME_LOCAL[duplicated(as.character(temperature_df$TIME_LOCAL))] %>% knitr::kable() #use view()
x
2011-11-06 01:00:00
2011-11-06 01:06:00
2011-11-06 01:12:00
2011-11-06 01:18:00
2011-11-06 01:24:00
2011-11-06 01:30:00
2011-11-06 01:36:00
2011-11-06 01:42:00
2011-11-06 01:48:00
2011-11-06 01:54:00
2012-11-04 01:00:00
2012-11-04 01:06:00
2012-11-04 01:12:00
2012-11-04 01:18:00
2012-11-04 01:24:00
2012-11-04 01:30:00
2012-11-04 01:36:00
2012-11-04 01:42:00
2012-11-04 01:48:00
2012-11-04 01:54:00
2013-11-03 01:00:00
2013-11-03 01:06:00
2013-11-03 01:12:00
2013-11-03 01:18:00
2013-11-03 01:24:00
2013-11-03 01:30:00
2013-11-03 01:36:00
2013-11-03 01:42:00
2013-11-03 01:48:00
2013-11-03 01:54:00
2014-11-02 01:00:00
2014-11-02 01:06:00
2014-11-02 01:12:00
2014-11-02 01:18:00
2014-11-02 01:24:00
2014-11-02 01:30:00
2014-11-02 01:36:00
2014-11-02 01:42:00
2014-11-02 01:48:00
2014-11-02 01:54:00
2015-11-01 01:00:00
2015-11-01 01:06:00
2015-11-01 01:12:00
2015-11-01 01:18:00
2015-11-01 01:24:00
2015-11-01 01:30:00
2015-11-01 01:36:00
2015-11-01 01:42:00
2015-11-01 01:48:00
2015-11-01 01:54:00
2016-11-06 01:00:00
2016-11-06 01:06:00
2016-11-06 01:12:00
2016-11-06 01:18:00
2016-11-06 01:24:00
2016-11-06 01:30:00
2016-11-06 01:36:00
2016-11-06 01:42:00
2016-11-06 01:48:00
2016-11-06 01:54:00
2017-11-05 01:00:00
2017-11-05 01:06:00
2017-11-05 01:12:00
2017-11-05 01:18:00
2017-11-05 01:24:00
2017-11-05 01:30:00
2017-11-05 01:36:00
2017-11-05 01:42:00
2017-11-05 01:48:00
2017-11-05 01:54:00
2018-11-04 01:00:00
2018-11-04 01:06:00
2018-11-04 01:12:00
2018-11-04 01:18:00
2018-11-04 01:24:00
2018-11-04 01:30:00
2018-11-04 01:36:00
2018-11-04 01:42:00
2018-11-04 01:48:00
2018-11-04 01:54:00
2019-11-03 01:00:00
2019-11-03 01:06:00
2019-11-03 01:12:00
2019-11-03 01:18:00
2019-11-03 01:24:00
2019-11-03 01:30:00
2019-11-03 01:36:00
2019-11-03 01:42:00
2019-11-03 01:48:00
2019-11-03 01:54:00
2020-11-01 01:00:00
2020-11-01 01:06:00
2020-11-01 01:12:00
2020-11-01 01:18:00
2020-11-01 01:24:00
2020-11-01 01:30:00
2020-11-01 01:36:00
2020-11-01 01:42:00
2020-11-01 01:48:00
2020-11-01 01:54:00

Check hours around the DST on March:

  • Daylight saving time 2020 in California began at 2:00 AM on
  • Sunday, March 8
  • and ended at 2:00 AM on
  • Sunday, November 1
  • All times are in Pacific Time.
temperature_df %>%
  filter(TIME_LOCAL >= "2020-03-08 01:30:00",
         TIME_LOCAL < "2020-03-08 04:00:00") %>%
  #view()
  knitr::kable()
TIME ATMP TIME_LOCAL
2020-03-08 09:30:00 7.9 2020-03-08 01:30:00
2020-03-08 09:36:00 7.8 2020-03-08 01:36:00
2020-03-08 09:42:00 7.8 2020-03-08 01:42:00
2020-03-08 09:48:00 NA 2020-03-08 01:48:00
2020-03-08 09:54:00 7.5 2020-03-08 01:54:00
2020-03-08 10:00:00 7.6 2020-03-08 03:00:00
2020-03-08 10:06:00 7.3 2020-03-08 03:06:00
2020-03-08 10:12:00 7.4 2020-03-08 03:12:00
2020-03-08 10:18:00 NA 2020-03-08 03:18:00
2020-03-08 10:24:00 7.3 2020-03-08 03:24:00
2020-03-08 10:30:00 NA 2020-03-08 03:30:00
2020-03-08 10:36:00 7.1 2020-03-08 03:36:00
2020-03-08 10:42:00 7.3 2020-03-08 03:42:00
2020-03-08 10:48:00 7.4 2020-03-08 03:48:00
2020-03-08 10:54:00 7.3 2020-03-08 03:54:00

You can see that there is not 2AM!

Check hours around the DST on November:

temperature_df %>%
  filter(TIME_LOCAL >= "2020-11-01 00:30:00",
         TIME_LOCAL < "2020-11-01 04:00:00") %>%
  #view()
  knitr::kable()
TIME ATMP TIME_LOCAL
2020-11-01 07:30:00 12.4 2020-11-01 00:30:00
2020-11-01 07:36:00 12.3 2020-11-01 00:36:00
2020-11-01 07:42:00 12.5 2020-11-01 00:42:00
2020-11-01 07:48:00 12.6 2020-11-01 00:48:00
2020-11-01 07:54:00 12.8 2020-11-01 00:54:00
2020-11-01 08:00:00 12.9 2020-11-01 01:00:00
2020-11-01 08:06:00 12.8 2020-11-01 01:06:00
2020-11-01 08:12:00 12.8 2020-11-01 01:12:00
2020-11-01 08:18:00 13.0 2020-11-01 01:18:00
2020-11-01 08:24:00 13.2 2020-11-01 01:24:00
2020-11-01 08:30:00 13.3 2020-11-01 01:30:00
2020-11-01 08:36:00 13.2 2020-11-01 01:36:00
2020-11-01 08:42:00 13.3 2020-11-01 01:42:00
2020-11-01 08:48:00 13.4 2020-11-01 01:48:00
2020-11-01 08:54:00 13.1 2020-11-01 01:54:00
2020-11-01 09:00:00 12.8 2020-11-01 01:00:00
2020-11-01 09:06:00 12.9 2020-11-01 01:06:00
2020-11-01 09:12:00 13.4 2020-11-01 01:12:00
2020-11-01 09:18:00 NA 2020-11-01 01:18:00
2020-11-01 09:24:00 13.7 2020-11-01 01:24:00
2020-11-01 09:30:00 13.5 2020-11-01 01:30:00
2020-11-01 09:36:00 13.6 2020-11-01 01:36:00
2020-11-01 09:42:00 14.0 2020-11-01 01:42:00
2020-11-01 09:48:00 13.5 2020-11-01 01:48:00
2020-11-01 09:54:00 13.4 2020-11-01 01:54:00
2020-11-01 10:00:00 13.9 2020-11-01 02:00:00
2020-11-01 10:06:00 14.1 2020-11-01 02:06:00
2020-11-01 10:12:00 14.0 2020-11-01 02:12:00
2020-11-01 10:18:00 13.6 2020-11-01 02:18:00
2020-11-01 10:24:00 13.8 2020-11-01 02:24:00
2020-11-01 10:30:00 13.7 2020-11-01 02:30:00
2020-11-01 10:36:00 13.9 2020-11-01 02:36:00
2020-11-01 10:42:00 13.6 2020-11-01 02:42:00
2020-11-01 10:48:00 13.3 2020-11-01 02:48:00
2020-11-01 10:54:00 13.3 2020-11-01 02:54:00
2020-11-01 11:00:00 NA 2020-11-01 03:00:00

There are two 1AM in November.. a mess..

Local Solar Time

Depending on the analysis you’re carrying you might want different time representation..

We might want to use the Solar Local Time. This make more sense when making some climatic analysis.. not when communicating with the public!

If we check again the time-zone plot: https://en.wikipedia.org/wiki/Coordinated_Universal_Time#/media/File:World_Time_Zones_Map.png

we see that San Francisco belongs to UTC-8. Use

temperature_df <- temperature_df %>% 
  mutate(TIME_LOCAL_SOLAR = TIME - lubridate::hours(8))

In this case we make an explicit call to the function hours() from lubridate using ::. This is a good way to call a function avoiding implicit information.

Check again

temperature_df %>%
  filter(TIME_LOCAL >= "2020-03-08 01:30:00",
         TIME_LOCAL < "2020-03-08 04:00:00") %>%
  #view()
  knitr::kable()
TIME ATMP TIME_LOCAL TIME_LOCAL_SOLAR
2020-03-08 09:30:00 7.9 2020-03-08 01:30:00 2020-03-08 01:30:00
2020-03-08 09:36:00 7.8 2020-03-08 01:36:00 2020-03-08 01:36:00
2020-03-08 09:42:00 7.8 2020-03-08 01:42:00 2020-03-08 01:42:00
2020-03-08 09:48:00 NA 2020-03-08 01:48:00 2020-03-08 01:48:00
2020-03-08 09:54:00 7.5 2020-03-08 01:54:00 2020-03-08 01:54:00
2020-03-08 10:00:00 7.6 2020-03-08 03:00:00 2020-03-08 02:00:00
2020-03-08 10:06:00 7.3 2020-03-08 03:06:00 2020-03-08 02:06:00
2020-03-08 10:12:00 7.4 2020-03-08 03:12:00 2020-03-08 02:12:00
2020-03-08 10:18:00 NA 2020-03-08 03:18:00 2020-03-08 02:18:00
2020-03-08 10:24:00 7.3 2020-03-08 03:24:00 2020-03-08 02:24:00
2020-03-08 10:30:00 NA 2020-03-08 03:30:00 2020-03-08 02:30:00
2020-03-08 10:36:00 7.1 2020-03-08 03:36:00 2020-03-08 02:36:00
2020-03-08 10:42:00 7.3 2020-03-08 03:42:00 2020-03-08 02:42:00
2020-03-08 10:48:00 7.4 2020-03-08 03:48:00 2020-03-08 02:48:00
2020-03-08 10:54:00 7.3 2020-03-08 03:54:00 2020-03-08 02:54:00
temperature_df %>%
  filter(TIME_LOCAL >= "2020-11-01 00:00:00",
         TIME_LOCAL < "2020-11-01 04:00:00") %>%
  #view()
  knitr::kable()
TIME ATMP TIME_LOCAL TIME_LOCAL_SOLAR
2020-11-01 07:00:00 12.5 2020-11-01 00:00:00 2020-10-31 23:00:00
2020-11-01 07:06:00 12.5 2020-11-01 00:06:00 2020-10-31 23:06:00
2020-11-01 07:12:00 12.4 2020-11-01 00:12:00 2020-10-31 23:12:00
2020-11-01 07:18:00 NA 2020-11-01 00:18:00 2020-10-31 23:18:00
2020-11-01 07:24:00 12.4 2020-11-01 00:24:00 2020-10-31 23:24:00
2020-11-01 07:30:00 12.4 2020-11-01 00:30:00 2020-10-31 23:30:00
2020-11-01 07:36:00 12.3 2020-11-01 00:36:00 2020-10-31 23:36:00
2020-11-01 07:42:00 12.5 2020-11-01 00:42:00 2020-10-31 23:42:00
2020-11-01 07:48:00 12.6 2020-11-01 00:48:00 2020-10-31 23:48:00
2020-11-01 07:54:00 12.8 2020-11-01 00:54:00 2020-10-31 23:54:00
2020-11-01 08:00:00 12.9 2020-11-01 01:00:00 2020-11-01 00:00:00
2020-11-01 08:06:00 12.8 2020-11-01 01:06:00 2020-11-01 00:06:00
2020-11-01 08:12:00 12.8 2020-11-01 01:12:00 2020-11-01 00:12:00
2020-11-01 08:18:00 13.0 2020-11-01 01:18:00 2020-11-01 00:18:00
2020-11-01 08:24:00 13.2 2020-11-01 01:24:00 2020-11-01 00:24:00
2020-11-01 08:30:00 13.3 2020-11-01 01:30:00 2020-11-01 00:30:00
2020-11-01 08:36:00 13.2 2020-11-01 01:36:00 2020-11-01 00:36:00
2020-11-01 08:42:00 13.3 2020-11-01 01:42:00 2020-11-01 00:42:00
2020-11-01 08:48:00 13.4 2020-11-01 01:48:00 2020-11-01 00:48:00
2020-11-01 08:54:00 13.1 2020-11-01 01:54:00 2020-11-01 00:54:00
2020-11-01 09:00:00 12.8 2020-11-01 01:00:00 2020-11-01 01:00:00
2020-11-01 09:06:00 12.9 2020-11-01 01:06:00 2020-11-01 01:06:00
2020-11-01 09:12:00 13.4 2020-11-01 01:12:00 2020-11-01 01:12:00
2020-11-01 09:18:00 NA 2020-11-01 01:18:00 2020-11-01 01:18:00
2020-11-01 09:24:00 13.7 2020-11-01 01:24:00 2020-11-01 01:24:00
2020-11-01 09:30:00 13.5 2020-11-01 01:30:00 2020-11-01 01:30:00
2020-11-01 09:36:00 13.6 2020-11-01 01:36:00 2020-11-01 01:36:00
2020-11-01 09:42:00 14.0 2020-11-01 01:42:00 2020-11-01 01:42:00
2020-11-01 09:48:00 13.5 2020-11-01 01:48:00 2020-11-01 01:48:00
2020-11-01 09:54:00 13.4 2020-11-01 01:54:00 2020-11-01 01:54:00
2020-11-01 10:00:00 13.9 2020-11-01 02:00:00 2020-11-01 02:00:00
2020-11-01 10:06:00 14.1 2020-11-01 02:06:00 2020-11-01 02:06:00
2020-11-01 10:12:00 14.0 2020-11-01 02:12:00 2020-11-01 02:12:00
2020-11-01 10:18:00 13.6 2020-11-01 02:18:00 2020-11-01 02:18:00
2020-11-01 10:24:00 13.8 2020-11-01 02:24:00 2020-11-01 02:24:00
2020-11-01 10:30:00 13.7 2020-11-01 02:30:00 2020-11-01 02:30:00
2020-11-01 10:36:00 13.9 2020-11-01 02:36:00 2020-11-01 02:36:00
2020-11-01 10:42:00 13.6 2020-11-01 02:42:00 2020-11-01 02:42:00
2020-11-01 10:48:00 13.3 2020-11-01 02:48:00 2020-11-01 02:48:00
2020-11-01 10:54:00 13.3 2020-11-01 02:54:00 2020-11-01 02:54:00
2020-11-01 11:00:00 NA 2020-11-01 03:00:00 2020-11-01 03:00:00

Notes on UTC, DST, Solar Time

If we are doing a multi-year climatic analysis we are missing ~8 hr.. in 10 years is not a huge issue.. but take this into consideration.

summary(temperature_df)
##       TIME                          ATMP         TIME_LOCAL                 
##  Min.   :2011-01-01 00:00:00   Min.   : 1.80   Min.   :2010-12-31 16:00:00  
##  1st Qu.:2013-06-30 11:04:30   1st Qu.:11.30   1st Qu.:2013-06-30 04:04:30  
##  Median :2015-12-29 13:57:00   Median :12.90   Median :2015-12-29 05:57:00  
##  Mean   :2015-12-31 00:24:33   Mean   :12.99   Mean   :2015-12-30 16:24:33  
##  3rd Qu.:2018-06-30 22:01:30   3rd Qu.:14.60   3rd Qu.:2018-06-30 15:01:30  
##  Max.   :2020-12-31 23:54:00   Max.   :34.40   Max.   :2020-12-31 15:54:00  
##                                NA's   :34964                                
##  TIME_LOCAL_SOLAR             
##  Min.   :2010-12-31 16:00:00  
##  1st Qu.:2013-06-30 03:04:30  
##  Median :2015-12-29 05:57:00  
##  Mean   :2015-12-30 16:24:33  
##  3rd Qu.:2018-06-30 14:01:30  
##  Max.   :2020-12-31 15:54:00  
## 

There are no missing or duplicated hours.. as we want.

temperature_df$TIME_LOCAL_SOLAR[duplicated(as.character(temperature_df$TIME_LOCAL_SOLAR))]
## POSIXct of length 0

group() and summarise()

These functions are so powerful you can even imagine..

Part 1 YEAR

Let’s prepare our final data-set and save it in temperature_df_solar.

temperature_df_solar <- temperature_df %>% 
  select(TIME_LOCAL_SOLAR, ATMP)

We will use a working data-set tmp_df and we will create a YEAR variable that we are going to use for grouping and doing some statistical analysis:

tmp_df <- temperature_df_solar %>%
  mutate(YEAR = lubridate::year(TIME_LOCAL_SOLAR))

NOTE FOR FEDE: this part have to be explained slowly.. introduce group() and summarise() start without filter and only TEMP averaging and n(). Then filter() and add standard deviation etc.

tmp_df %>%
  # filter(TIME_LOCAL_SOLAR >= lubridate::ymd("2011-01-01")) %>%
  group_by(YEAR) %>%
  #summarise(TIME = mean(TIME_LOCAL_SOLAR, na.rm=TRUE), # we don't really need this.. 
  # option 1, we remove it.. option 2 below
  summarise(
            AVG_TEMP = mean(ATMP, na.rm=TRUE),
            #SD_TEMP = sd(ATMP, na.rm=TRUE),
            #SD_TEMP_PERC = SD_TEMP/AVG_TEMP*100, # this is cool
            #SD_TEMP_PERC2 = sd(ATMP, na.rm=TRUE)/mean(ATMP, na.rm=TRUE)*100, 
            nr = n()
            ) %>%
  glimpse()
## Rows: 11
## Columns: 3
## $ YEAR     <dbl> 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2…
## $ AVG_TEMP <dbl> 8.408861, 12.065400, 12.146867, 12.326561, 13.934820, 13.8483…
## $ nr       <int> 79, 87239, 87382, 86889, 87265, 87056, 86751, 87212, 86505, 8…

Plotting mean(ATMP) Part1

The best way to look at the data is to make some plots. We will introduce

  • %>% into ggplot()
  • geom_line() to plot lines
  • geom_point() to plot points
  • observe that ggplot() works with layers
  • scale_x_continuous()
  • change plot elements colors
tmp_df %>%
  filter(TIME_LOCAL_SOLAR >= lubridate::ymd("2011-01-01")) %>%
  group_by(YEAR) %>%
  summarise(
    AVG_TEMP = mean(ATMP, na.rm=TRUE),
    SD_TEMP = sd(ATMP, na.rm=TRUE),
    SD_TEMP_PERC = SD_TEMP/AVG_TEMP*100,
    nr = n()
  ) %>%
  ggplot(data = ., aes(x = YEAR, y = AVG_TEMP)) +
  geom_line(color="#006494") + 
  geom_point(size=3, color="#003554") +
  # geom_point(size=3, color="red") + # red covering the blue
  #geom_line() + #this cover the points! better before
  scale_x_continuous(breaks = seq(2011,2020,1), limits = c(2011,2020)) +
  xlab(element_blank()) + #https://ggplot2.tidyverse.org/reference/theme.html
  ylab("Atmospheric Temperature (°C)") +
  theme_classic() +
  theme(panel.grid.major = element_line(colour = "#B3B0AE"))

Comment: ggplot position scales for date/time data is very nice to format the default appearance of the x-axes label: https://ggplot2.tidyverse.org/reference/scale_date.html.

Plotting mean(ATMP) Part2

Adding Standard Deviation to the plot

tmp_df %>%
  filter(TIME_LOCAL_SOLAR >= lubridate::ymd("2011-01-01")) %>%
  group_by(YEAR) %>%
  summarise(
    AVG_TEMP = mean(ATMP, na.rm=TRUE),
    SD_TEMP = sd(ATMP, na.rm=TRUE),
    SD_TEMP_PERC = SD_TEMP/AVG_TEMP*100,
    nr = n()
  ) %>%
  ggplot(data = .) +
  geom_line(aes(x = YEAR, y = (AVG_TEMP + SD_TEMP )), color = "#006494") + # THIS THE SD
  geom_line(aes(x = YEAR, y = (AVG_TEMP - SD_TEMP )), color = "#006494") + # THIS THE SD
  geom_line(aes(x = YEAR, y = AVG_TEMP), color="#006494") +           # THIS FOR THE AVERAGE 
  geom_point(aes(x = YEAR, y = AVG_TEMP), size=3, color="#003554") +  # THIS FOR THE AVERAGE 
  #geom_line() + #this cover the points! better before
  scale_x_continuous(breaks = seq(2011,2020,1), limits = c(2011,2020)) +
  xlab(element_blank()) + #https://ggplot2.tidyverse.org/reference/theme.html
  ylab("Atmospheric Temperature (°C)") +
  theme_classic() +
  theme(panel.grid.major = element_line(colour = "#B3B0AE"))

Plotting mean(ATMP) Part3

Adding geom_segment() as an example of additional geometry. The list of ggplot geometries: https://ggplot2.tidyverse.org/reference/

tmp_df %>%
  filter(TIME_LOCAL_SOLAR >= lubridate::ymd("2011-01-01")) %>%
  group_by(YEAR) %>%
  summarise(
    AVG_TEMP = mean(ATMP, na.rm=TRUE),
    SD_TEMP = sd(ATMP, na.rm=TRUE),
    SD_TEMP_PERC = SD_TEMP/AVG_TEMP*100,
    T_MAX = max(ATMP, na.rm = TRUE),
    T_MIN = min(ATMP, na.rm = TRUE),
    nr = n()
  ) %>%
  ggplot(data = .) +
  geom_point(aes(x = YEAR, y = T_MAX), color = "#051923", bg = "#051923", pch = 24) + #ADD 1 
  geom_point(aes(x = YEAR, y = T_MIN), color = "#051923", bg = "#051923", pch = 25) + #ADD 1 
  geom_segment(aes(x = YEAR, xend = YEAR, y = T_MIN, yend = T_MAX), color = "#00A6FB") +
  geom_line(aes(x = YEAR, y = (AVG_TEMP + SD_TEMP )), color = "#006494") +
  geom_line(aes(x = YEAR, y = (AVG_TEMP - SD_TEMP )), color = "#006494") +
  geom_line(aes(x = YEAR, y = AVG_TEMP), color="#006494") +
  geom_point(aes(x = YEAR, y = AVG_TEMP), size=3, color="#003554") +
  #geom_line() + #this cover the points! better before
  scale_x_continuous(breaks = seq(2011,2020,1), limits = c(2011,2020)) +
  xlab(element_blank()) + #https://ggplot2.tidyverse.org/reference/theme.html
  ylab("Atmospheric Temperature (°C)") +
  theme_classic() +
  theme(panel.grid.major = element_line(colour = "#CCCCCC"))

A boxplot: geom_boxplot()

tmp_df %>%
  filter(TIME_LOCAL_SOLAR >= lubridate::ymd("2011-01-01")) %>%
  ggplot(.) +
  geom_boxplot(aes(x = YEAR, y = ATMP, group = YEAR)) +
  scale_x_continuous(breaks = seq(2011,2020,1), limits = c(2010.5,2020.5)) +
  xlab(element_blank()) + #https://ggplot2.tidyverse.org/reference/theme.html
  ylab("Atmospheric Temperature (°C)") +
  theme_classic() +
  theme(panel.grid.major = element_line(colour = "#CCCCCC"))
## Warning: Removed 34964 rows containing non-finite values (stat_boxplot).

A violin plot: geom_violin()

tmp_df %>%
  filter(TIME_LOCAL_SOLAR >= lubridate::ymd("2011-01-01")) %>%
  ggplot(.) +
  geom_violin(aes(x = YEAR, y = ATMP, group = YEAR)) +
  scale_x_continuous(breaks = seq(2011,2020,1), limits = c(2010.5,2020.5)) +
  xlab(element_blank()) + #https://ggplot2.tidyverse.org/reference/theme.html
  ylab("Atmospheric Temperature (°C)") +
  theme_classic() +
  theme(panel.grid.major = element_line(colour = "#CCCCCC"))
## Warning: Removed 34964 rows containing non-finite values (stat_ydensity).

### Take home message

Use geometry functions to produce better (nicer, meaningful) plots.

Part 2 MONTH

We have seen how powerful are group(). Let’s consider MONTH:

NOTA PER FEDE: start only with MONTH and then move to label=TRUE

tmp_df <- temperature_df_solar %>%
  filter(TIME_LOCAL_SOLAR > lubridate::ymd("2011-01-01")) %>%
  mutate(YEAR = lubridate::year(TIME_LOCAL_SOLAR),
         MONTH = lubridate::month(TIME_LOCAL_SOLAR), 
         MONTH_NAME = lubridate::month(TIME_LOCAL_SOLAR, label = TRUE, abbr = FALSE),
         MONTH_ABBR = lubridate::month(TIME_LOCAL_SOLAR, label = TRUE),
         ) %>%
  glimpse()
## Rows: 870,424
## Columns: 6
## $ TIME_LOCAL_SOLAR <dttm> 2011-01-01 00:06:00, 2011-01-01 00:12:00, 2011-01-01…
## $ ATMP             <dbl> 9.2, 9.1, 9.0, 8.9, 8.8, 8.8, 8.8, 8.7, 8.8, 8.8, 8.7…
## $ YEAR             <dbl> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011,…
## $ MONTH            <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ MONTH_NAME       <ord> January, January, January, January, January, January,…
## $ MONTH_ABBR       <ord> Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan…

monthly boxplot() 1

Here we will want to see how the Temperature distribution of a month is changing (or not) over the years.

tmp_df %>%
  filter(MONTH == 1) %>% # change from 1 to 12
  ggplot(.) +
  geom_boxplot(aes(x = YEAR, y = ATMP, group = YEAR) ) +
  scale_x_continuous(breaks = seq(2011,2020,1), limits = c(2010.5,2020.5)) +
  scale_y_continuous(limits = c(0,35)) +
  xlab(element_blank()) + #https://ggplot2.tidyverse.org/reference/theme.html
  ylab("Atmospheric Temperature (°C)") +
  theme_classic() +
  theme(panel.grid.major = element_line(colour = "#CCCCCC"))
## Warning: Removed 1354 rows containing non-finite values (stat_boxplot).

monthly boxplot() 2

Climatic analysis over 10 yrs. Each boxplot contains data of all a month, for example all January months.

Note: using levels() for the labels() is a good option!

tmp_df %>%
  ggplot(.) +
  geom_boxplot(aes(x = MONTH, y = ATMP, group = MONTH) ) +
  #scale_x_continuous(breaks = seq(1,12,1), limits = c(0.5,12.5)) +
  scale_x_continuous(breaks = seq(1,12,1), limits = c(0.5,12.5), labels = c(levels(tmp_df$MONTH_ABBR))) + #https://ggplot2.tidyverse.org/reference/scale_continuous.html
  scale_y_continuous(limits = c(0,35)) +
  xlab(element_blank()) + 
  ylab("Atmospheric Temperature (°C)") +
  theme_classic() +
  theme(panel.grid.major = element_line(colour = "#CCCCCC"))
## Warning: Removed 34964 rows containing non-finite values (stat_boxplot).

tmp_df %>% 
  group_by(MONTH) %>%
  summarise(YEAR = YEAR,
            MONTH = MONTH,
            T_AVG = median(ATMP, na.rm = TRUE),
            n = n()
            ) %>%
  glimpse() %>%
  ggplot(.) +
  geom_point(aes(x = MONTH, y = T_AVG, group = MONTH)) +
  scale_x_continuous(breaks = seq(1,12,1), limits = c(0.5,12.5), labels = c(levels(tmp_df$MONTH_ABBR))) +
  scale_y_continuous(limits = c(0,35)) +
  xlab(element_blank()) + 
  ylab("Atmospheric Temperature (°C)") +
  theme_classic() +
  theme(panel.grid.major = element_line(colour = "#CCCCCC"))
## `summarise()` has grouped output by 'MONTH'. You can override using the
## `.groups` argument.
## Rows: 870,424
## Columns: 4
## Groups: MONTH [12]
## $ MONTH <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ YEAR  <dbl> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011…
## $ T_AVG <dbl> 11.1, 11.1, 11.1, 11.1, 11.1, 11.1, 11.1, 11.1, 11.1, 11.1, 11.1…
## $ n     <int> 73664, 73664, 73664, 73664, 73664, 73664, 73664, 73664, 73664, 7…

geom_line() plots

tmp_df %>%
  group_by(YEAR, MONTH) %>%
  summarise(#TIME = lubridate::floor_date(x = mean(TIME_LOCAL_SOLAR, na.rm=TRUE), unit = "day"), #Not in class
            TEMPERATURE = mean(ATMP, na.rm = T)
  ) %>% 
  #ungroup() %>% # Not in class
  #filter(YEAR == 2011) %>% # EXAMPLE IN CLASS
  glimpse() %>%
  ggplot(data = ., aes(x = MONTH, y = TEMPERATURE, group = as.factor(YEAR), color=as.factor(YEAR))) +
  geom_line() +
  scale_x_continuous(breaks = seq(1,12,1), limits = c(0.5,12.5), labels = c(levels(tmp_df$MONTH_ABBR))) +
  scale_y_continuous(limits = c(0,35)) +
  xlab(element_blank()) + 
  ylab("Atmospheric Temperature (°C)") +
  theme_classic() +
  theme(panel.grid.major = element_line(colour = "#CCCCCC"))
## `summarise()` has grouped output by 'YEAR'. You can override using the `.groups`
## argument.
## Rows: 120
## Columns: 3
## Groups: YEAR [10]
## $ YEAR        <dbl> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011…
## $ MONTH       <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7…
## $ TEMPERATURE <dbl> 9.779760, 9.923522, 11.237834, 11.406021, 11.705405, 12.74…

Part 3 SEASON & case_when()

Let’s create a new grouping.. just to play! And we can learn a new function!

tmp_df <- tmp_df %>%
  mutate(SEASON = case_when( MONTH %in% c(1:2,12) ~ "Winter",
                                  MONTH %in% c(3:5) ~ "Spring",
                                  MONTH %in% c(6:8) ~ "Summer",
                                  MONTH %in% c(9:11) ~ "Fall"
                                  ))

… also.. do you remember the %in% operator!?

tmp_df %>%
  filter(YEAR == 2018) %>%
  ggplot() +
  geom_boxplot(aes(x = SEASON, y = ATMP, group = SEASON))
## Warning: Removed 3932 rows containing non-finite values (stat_boxplot).

There is something wrong here…. we need factors!

factor()

Remember that factors are data structures in R that store categorical data. In datasets, there are often fields that take only a few predefined values. For example: gender, availability, country, marital status, etc.

tmp_df <- tmp_df %>%
  mutate(SEASON = factor(SEASON, levels = c("Winter", 
                                            "Spring",
                                            "Summer",
                                            "Fall")
                         )
         )

And plot again:

tmp_df %>%
  filter(YEAR == 2018) %>%
  ggplot() +
  geom_boxplot(aes(x = SEASON, y = ATMP, group = SEASON))
## Warning: Removed 3932 rows containing non-finite values (stat_boxplot).

Wait a minute… Fall temperature is higher than Summer?

tmp_df %>%
  filter(YEAR == 2018) %>%
  group_by(MONTH) %>%
  summarise(T_AVG = mean(ATMP, na.rm = TRUE)
            ) %>%
  #view()
  knitr::kable()
MONTH T_AVG
1 11.47157
2 11.33772
3 11.19353
4 12.14815
5 12.47028
6 13.24482
7 14.02146
8 13.89349
9 14.02653
10 14.91397
11 13.76763
12 11.77784

The Indian Summer is real! https://en.wikipedia.org/wiki/Indian_summer

Part 4 DAY

We are reaching the top! Create a DAY variable:

tmp_df <- temperature_df_solar %>%
  filter(TIME_LOCAL_SOLAR > lubridate::ymd("2011-01-01")) %>%
  mutate(YEAR = lubridate::year(TIME_LOCAL_SOLAR),
         MONTH = lubridate::month(TIME_LOCAL_SOLAR), 
         MONTH_NAME = lubridate::month(TIME_LOCAL_SOLAR, label = TRUE, abbr = FALSE),
         MONTH_ABBR = lubridate::month(TIME_LOCAL_SOLAR, label = TRUE, abbr = TRUE),
         DAY = lubridate::day(TIME_LOCAL_SOLAR),
         DAY_NAME = lubridate::wday(TIME_LOCAL_SOLAR, label = TRUE, abbr = FALSE),
         DAY_ABBR = lubridate::wday(TIME_LOCAL_SOLAR, label = TRUE, abbr = TRUE),
         WEEKDAY = lubridate::wday(TIME_LOCAL_SOLAR),
         SEASON = case_when( MONTH %in% c(1:2,12) ~ "Winter",
                             MONTH %in% c(3:5) ~ "Spring",
                             MONTH %in% c(6:8) ~ "Summer",
                             MONTH %in% c(9:11) ~ "Fall"
                             )
         ) %>%
  mutate(SEASON = factor(SEASON, levels = c("Winter", 
                                            "Spring",
                                            "Summer",
                                            "Fall")
                         )
         ) %>%
  glimpse()
## Rows: 870,424
## Columns: 11
## $ TIME_LOCAL_SOLAR <dttm> 2011-01-01 00:06:00, 2011-01-01 00:12:00, 2011-01-01…
## $ ATMP             <dbl> 9.2, 9.1, 9.0, 8.9, 8.8, 8.8, 8.8, 8.7, 8.8, 8.8, 8.7…
## $ YEAR             <dbl> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011,…
## $ MONTH            <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ MONTH_NAME       <ord> January, January, January, January, January, January,…
## $ MONTH_ABBR       <ord> Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan…
## $ DAY              <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ DAY_NAME         <ord> Saturday, Saturday, Saturday, Saturday, Saturday, Sat…
## $ DAY_ABBR         <ord> Sat, Sat, Sat, Sat, Sat, Sat, Sat, Sat, Sat, Sat, Sat…
## $ WEEKDAY          <dbl> 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,…
## $ SEASON           <fct> Winter, Winter, Winter, Winter, Winter, Winter, Winte…

Let’s compare January’s days over the 10 years! (.. if it makes sense..)

tmp_df %>%
  filter(MONTH == 1) %>%
  group_by(YEAR,DAY) %>%
  summarise(TIME = lubridate::floor_date(x = mean(TIME_LOCAL_SOLAR, na.rm=TRUE), unit = "day"),
            TEMPERATURE = mean(ATMP, na.rm = T)
  ) %>% 
  ungroup() %>%
  #filter(YEAR == 2011) %>%
  ggplot(data = ., aes(x = DAY, y = TEMPERATURE, group = as.factor(YEAR), color=as.factor(YEAR))) +
  geom_line() +
  scale_x_continuous(breaks = seq(1,31,1), limits = c(0.5,31.5)) +
  scale_y_continuous(limits = c(0,35)) +
  xlab(element_blank()) + 
  ylab("Atmospheric Temperature (°C)") +
  theme_classic() +
  theme(panel.grid.major = element_line(colour = "#CCCCCC"))
## `summarise()` has grouped output by 'YEAR'. You can override using the `.groups`
## argument.

Part 5 HOUR

We are finally got the queen! Make an hourly average :D .. will take some time

tmp_df <- temperature_df_solar %>%
  filter(TIME_LOCAL_SOLAR > lubridate::ymd("2011-01-01")) %>%
  mutate(YEAR = lubridate::year(TIME_LOCAL_SOLAR),
         MONTH = lubridate::month(TIME_LOCAL_SOLAR), 
         MONTH_NAME = lubridate::month(TIME_LOCAL_SOLAR, label = TRUE, abbr = FALSE),
         MONTH_ABBR = lubridate::month(TIME_LOCAL_SOLAR, label = TRUE, abbr = TRUE),
         DAY = lubridate::day(TIME_LOCAL_SOLAR),
         DAY_NAME = lubridate::wday(TIME_LOCAL_SOLAR, label = TRUE, abbr = FALSE),
         DAY_ABBR = lubridate::wday(TIME_LOCAL_SOLAR, label = TRUE, abbr = TRUE),
         WEEKDAY = lubridate::wday(TIME_LOCAL_SOLAR),
         HOUR = lubridate::hour(TIME_LOCAL_SOLAR) # here!
  ) %>%
  glimpse()
## Rows: 870,424
## Columns: 11
## $ TIME_LOCAL_SOLAR <dttm> 2011-01-01 00:06:00, 2011-01-01 00:12:00, 2011-01-01…
## $ ATMP             <dbl> 9.2, 9.1, 9.0, 8.9, 8.8, 8.8, 8.8, 8.7, 8.8, 8.8, 8.7…
## $ YEAR             <dbl> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011,…
## $ MONTH            <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ MONTH_NAME       <ord> January, January, January, January, January, January,…
## $ MONTH_ABBR       <ord> Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan…
## $ DAY              <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ DAY_NAME         <ord> Saturday, Saturday, Saturday, Saturday, Saturday, Sat…
## $ DAY_ABBR         <ord> Sat, Sat, Sat, Sat, Sat, Sat, Sat, Sat, Sat, Sat, Sat…
## $ WEEKDAY          <dbl> 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,…
## $ HOUR             <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
tmp_df_1H <- temperature_df_solar %>%
  filter(TIME_LOCAL_SOLAR > lubridate::ymd("2011-01-01")) %>%
  mutate(YEAR = lubridate::year(TIME_LOCAL_SOLAR),
         MONTH = lubridate::month(TIME_LOCAL_SOLAR), 
         MONTH_NAME = lubridate::month(TIME_LOCAL_SOLAR, label = TRUE, abbr = FALSE),
         MONTH_ABBR = lubridate::month(TIME_LOCAL_SOLAR, label = TRUE, abbr = TRUE),
         DAY = lubridate::day(TIME_LOCAL_SOLAR),
         DAY_NAME = lubridate::wday(TIME_LOCAL_SOLAR, label = TRUE, abbr = FALSE),
         DAY_ABBR = lubridate::wday(TIME_LOCAL_SOLAR, label = TRUE, abbr = TRUE),
         WEEKDAY = lubridate::wday(TIME_LOCAL_SOLAR),
         HOUR = lubridate::hour(TIME_LOCAL_SOLAR)
  ) %>%
  group_by(YEAR, MONTH, DAY, HOUR) %>%
  summarise(TIME_HR = lubridate::ceiling_date(x = mean(TIME_LOCAL_SOLAR, na.rm=TRUE), unit = "hour"),
            ATMP_AVG = mean(ATMP, na.rm = TRUE),
            NR_OBS = n()
            ) %>%
  ungroup() %>%
  print(n=10)
## `summarise()` has grouped output by 'YEAR', 'MONTH', 'DAY'. You can override
## using the `.groups` argument.
## # A tibble: 87,367 × 7
##     YEAR MONTH   DAY  HOUR TIME_HR             ATMP_AVG NR_OBS
##    <dbl> <dbl> <int> <int> <dttm>                 <dbl>  <int>
##  1  2011     1     1     0 2011-01-01 01:00:00     8.9       9
##  2  2011     1     1     1 2011-01-01 02:00:00     8.66     10
##  3  2011     1     1     2 2011-01-01 03:00:00     8.44     10
##  4  2011     1     1     3 2011-01-01 04:00:00     8.51     10
##  5  2011     1     1     4 2011-01-01 05:00:00     7.75     10
##  6  2011     1     1     5 2011-01-01 06:00:00     6.98     10
##  7  2011     1     1     6 2011-01-01 07:00:00     6.72     10
##  8  2011     1     1     7 2011-01-01 08:00:00     6.39     10
##  9  2011     1     1     8 2011-01-01 09:00:00     6.39     10
## 10  2011     1     1     9 2011-01-01 10:00:00     6.62     10
## # … with 87,357 more rows

Maybe you can’t appreciate now how beauty and “simple” this is…

tmp_df_1H %>%
  filter(TIME_HR >= "2018-01-01",
         TIME_HR <= "2018-01-31") %>%
  ggplot(data = ., mapping = aes(x = TIME_HR, y = ATMP_AVG)) +
  geom_line() +
  geom_point()

Linear model plot MONTH

library(patchwork)
tmp_df %>%
  group_by(YEAR, MONTH_ABBR) %>%
  summarise(TEMPERATURE = median(ATMP, na.rm = TRUE)) %>%
  filter(MONTH_ABBR == "Jan") %>%
  ungroup() %>%
  glimpse() %>%
  ggplot(data = ., aes(x = YEAR, y = TEMPERATURE)) +
  geom_smooth(method = "lm") +
  geom_point() +
  ggpmisc::stat_poly_eq(formula = y ~ x,
                        parse = TRUE, label.x = "left", label.y = "top",
                        aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~"))) +
  plot_annotation(title = paste0("Temperature trend ", "January" ),
                  #subtitle = "Number of thermostat events in California homes during a wildfire episode; days with more events have darker shading",
                  theme = theme(plot.title = element_text(size = 14, colour = "grey20", face = "bold", hjust = 0),
                                plot.subtitle = element_text(size = 10, colour = "grey20", face = "italic", hjust = 0, margin = margin(b = 10)),
                                plot.background = element_rect(fill = "white"))) +
  ylab("Temperature (°C)") +
  xlab(element_blank())
## `summarise()` has grouped output by 'YEAR'. You can override using the `.groups`
## argument.
## Rows: 10
## Columns: 3
## $ YEAR        <dbl> 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020
## $ MONTH_ABBR  <ord> Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan
## $ TEMPERATURE <dbl> 10.10, 10.50, 9.70, 11.80, 11.90, 11.50, 9.70, 11.55, 12.0…
## `geom_smooth()` using formula 'y ~ x'

Windrose plot

Some plot for the wind direction. First let’s create the data-set.

wind_df <- meteo_df %>%
  select(TIME, WDIR, WSPD) %>%
  glimpse() %>%
  mutate(TIME = lubridate::ymd_hms(TIME, tz = "UTC") ) %>%
  glimpse()
## Rows: 870,504
## Columns: 3
## $ TIME <chr> "2011-01-01 00:00:00", "2011-01-01 00:06:00", "2011-01-01 00:12:0…
## $ WDIR <dbl> 3, 3, 1, 8, 8, 12, 11, 19, 8, 13, 7, 358, 358, 14, 7, 8, 12, 12, …
## $ WSPD <dbl> 4.3, 4.3, 4.6, 4.4, 4.4, 3.9, 3.2, 3.4, 3.1, 3.5, 3.9, 4.1, 4.1, …
## Rows: 870,504
## Columns: 3
## $ TIME <dttm> 2011-01-01 00:00:00, 2011-01-01 00:06:00, 2011-01-01 00:12:00, 2…
## $ WDIR <dbl> 3, 3, 1, 8, 8, 12, 11, 19, 8, 13, 7, 358, 358, 14, 7, 8, 12, 12, …
## $ WSPD <dbl> 4.3, 4.3, 4.6, 4.4, 4.4, 3.9, 3.2, 3.4, 3.1, 3.5, 3.9, 4.1, 4.1, …

Arrows

wind_df %>%
  filter(TIME >= "2020-01-01",
         TIME < "2020-01-02") %>%
  mutate(YEAR = lubridate::year(TIME),
         MONTH = lubridate::month(TIME), 
         MONTH_NAME = lubridate::month(TIME, label = TRUE, abbr = FALSE),
         MONTH_ABBR = lubridate::month(TIME, label = TRUE, abbr = TRUE),
         DAY = lubridate::day(TIME),
         DAY_NAME = lubridate::wday(TIME, label = TRUE, abbr = FALSE),
         DAY_ABBR = lubridate::wday(TIME, label = TRUE, abbr = TRUE),
         WEEKDAY = lubridate::wday(TIME),
         HOUR = lubridate::hour(TIME)
         ) %>%
  glimpse() %>%
  group_by(HOUR) %>%
  summarise(TIME = round_date(mean(TIME, na.rm=TRUE), unit = "hour"),
            WSPD = mean(WSPD, na.rm = TRUE),
            WDIR = (360+(atan2(sum(sin(WDIR*pi/180)),sum(cos(WDIR*pi/180)))*180/pi))%%360
  ) %>%
  ungroup() %>%
  mutate(WIND_SECTOR = case_when( WDIR > 0 & WDIR <= 22.5 ~ "North",
                                  WDIR > 22.5 & WDIR <= 67.5 ~ "North-East",
                                  WDIR > 67.5 & WDIR <= 112.5 ~ "East",
                                  WDIR > 112.5 & WDIR <= 157.5 ~ "South-East",
                                  WDIR > 157.5 & WDIR <= 202.5 ~ "South",
                                  WDIR > 202.5 & WDIR <= 247.5 ~ "South-West",
                                  WDIR > 247.5 & WDIR <= 292.5 ~ "West",
                                  WDIR > 292.5 & WDIR <= 337.5 ~ "North-West",
                                  WDIR > 337.5 & WDIR <= 360 ~ "North")
         ) %>%
  glimpse() %>%
  ggplot(., mapping = aes(x = TIME, y = WSPD)) +
  geom_segment(aes(x = HOUR,
  #geom_segment(aes(x = lubridate::hour(TIME),
  #geom_segment(aes(x = TIME,
                   y = 0,
                   # xend = HOUR - lubridate::dhours(1 * -cos((90-WDIR) / 360 * 2 * pi)),         # NORMALIZE TO 1 
                   # yend = -1 * (1 * -sin((90-WDIR) / 360 * 2 * pi))#, col = factor(mean_WS_ms)  # NORMALIZE TO 1 
                   xend = HOUR - lubridate::dhours(WSPD * -cos((90-WDIR) / 360 * 2 * pi)),        # ADDING LENGTH AS SPEED 
                   yend = - 1 * (WSPD * -sin((90-WDIR) / 360 * 2 * pi))                             # ADDING LENGTH AS SPEED
  ),
  arrow = arrow(length = unit(0.2, "cm"), type = "closed"),
  colour = RColorBrewer::brewer.pal(name = "BuPu", n = 9)[7]) +
  geom_point(aes(HOUR, 0), size = 1, colour = RColorBrewer::brewer.pal(name = "BuPu", n = 9)[7]) +
  coord_fixed(3600) +
  theme(legend.position = "none") +
  ylab("WD") +
  theme_minimal(base_size = 20) + 
  theme(axis.text.x = element_text(angle = 20), axis.title.x = element_blank(), 
        axis.text.y = element_text(colour = "white"), 
        panel.grid.major.y = element_blank(), 
        panel.grid.minor.y = element_blank(),
        plot.background = element_rect(fill = "transparent", colour = NA))
## Rows: 240
## Columns: 12
## $ TIME       <dttm> 2020-01-01 08:00:00, 2020-01-01 08:06:00, 2020-01-01 08:12…
## $ WDIR       <dbl> 269, 315, 246, 172, 103, 105, 267, 41, 74, 73, 86, 118, 200…
## $ WSPD       <dbl> 2.5, 1.1, 1.9, 0.9, 1.2, 1.5, 1.3, 2.4, 0.8, 0.7, 0.7, 0.9,…
## $ YEAR       <dbl> 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020,…
## $ MONTH      <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ MONTH_NAME <ord> January, January, January, January, January, January, Janua…
## $ MONTH_ABBR <ord> Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan,…
## $ DAY        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ DAY_NAME   <ord> Wednesday, Wednesday, Wednesday, Wednesday, Wednesday, Wedn…
## $ DAY_ABBR   <ord> Wed, Wed, Wed, Wed, Wed, Wed, Wed, Wed, Wed, Wed, Wed, Wed,…
## $ WEEKDAY    <dbl> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,…
## $ HOUR       <int> 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,…
## Rows: 24
## Columns: 5
## $ HOUR        <int> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, …
## $ TIME        <dttm> 2020-01-02 00:00:00, 2020-01-02 01:00:00, 2020-01-02 02:0…
## $ WSPD        <dbl> 3.97, 3.33, 2.67, 2.67, 4.46, 3.78, 3.26, 1.96, 1.43, 1.31…
## $ WDIR        <dbl> 238.48146, 242.40312, 237.22234, 242.44388, 245.29992, 244…
## $ WIND_SECTOR <chr> "South-West", "South-West", "South-West", "South-West", "S…

Using an external libraries

library(clifro)
library(openair)
## Warning: package 'openair' was built under R version 4.0.5

Prepare the dataframe:

wind_df_WR <- wind_df %>%
  filter(TIME >= "2020-01-01",
         TIME < "2020-01-02") %>%
  mutate(YEAR = lubridate::year(TIME),
         MONTH = lubridate::month(TIME), 
         MONTH_NAME = lubridate::month(TIME, label = TRUE, abbr = FALSE),
         MONTH_ABBR = lubridate::month(TIME, label = TRUE, abbr = TRUE),
         DAY = lubridate::day(TIME),
         DAY_NAME = lubridate::wday(TIME, label = TRUE, abbr = FALSE),
         DAY_ABBR = lubridate::wday(TIME, label = TRUE, abbr = TRUE),
         WEEKDAY = lubridate::wday(TIME),
         HOUR = lubridate::hour(TIME)
  ) %>%
  glimpse() %>%
  group_by(HOUR) %>%
  summarise(TIME = round_date(mean(TIME, na.rm=TRUE), unit = "hour"),
            WSPD = mean(WSPD, na.rm = TRUE),
            WDIR = (360+(atan2(sum(sin(WDIR*pi/180)),sum(cos(WDIR*pi/180)))*180/pi))%%360
  ) %>%
  ungroup() %>%
  mutate(WIND_SECTOR = case_when( WDIR > 0 & WDIR <= 22.5 ~ "North",
                                  WDIR > 22.5 & WDIR <= 67.5 ~ "North-East",
                                  WDIR > 67.5 & WDIR <= 112.5 ~ "East",
                                  WDIR > 112.5 & WDIR <= 157.5 ~ "South-East",
                                  WDIR > 157.5 & WDIR <= 202.5 ~ "South",
                                  WDIR > 202.5 & WDIR <= 247.5 ~ "South-West",
                                  WDIR > 247.5 & WDIR <= 292.5 ~ "West",
                                  WDIR > 292.5 & WDIR <= 337.5 ~ "North-West",
                                  WDIR > 337.5 & WDIR <= 360 ~ "North")
  ) %>%
  glimpse()
## Rows: 240
## Columns: 12
## $ TIME       <dttm> 2020-01-01 08:00:00, 2020-01-01 08:06:00, 2020-01-01 08:12…
## $ WDIR       <dbl> 269, 315, 246, 172, 103, 105, 267, 41, 74, 73, 86, 118, 200…
## $ WSPD       <dbl> 2.5, 1.1, 1.9, 0.9, 1.2, 1.5, 1.3, 2.4, 0.8, 0.7, 0.7, 0.9,…
## $ YEAR       <dbl> 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020,…
## $ MONTH      <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ MONTH_NAME <ord> January, January, January, January, January, January, Janua…
## $ MONTH_ABBR <ord> Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan,…
## $ DAY        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ DAY_NAME   <ord> Wednesday, Wednesday, Wednesday, Wednesday, Wednesday, Wedn…
## $ DAY_ABBR   <ord> Wed, Wed, Wed, Wed, Wed, Wed, Wed, Wed, Wed, Wed, Wed, Wed,…
## $ WEEKDAY    <dbl> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,…
## $ HOUR       <int> 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,…
## Rows: 24
## Columns: 5
## $ HOUR        <int> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, …
## $ TIME        <dttm> 2020-01-02 00:00:00, 2020-01-02 01:00:00, 2020-01-02 02:0…
## $ WSPD        <dbl> 3.97, 3.33, 2.67, 2.67, 4.46, 3.78, 3.26, 1.96, 1.43, 1.31…
## $ WDIR        <dbl> 238.48146, 242.40312, 237.22234, 242.44388, 245.29992, 244…
## $ WIND_SECTOR <chr> "South-West", "South-West", "South-West", "South-West", "S…

Plot:

with(wind_df_WR, windrose(speed = WSPD, 
                          direction = WDIR, 
                          speed_cuts = c(2,4,6,8), 
                          ggtheme = "bw"
                          )
     )

Multiplots (facultative)

library(patchwork) # at the end for extra time

p1 <- tmp_df %>%
  filter(TIME_LOCAL_SOLAR >= lubridate::ymd("2011-01-01")) %>%
  group_by(YEAR) %>%
  summarise(
    AVG_TEMP = mean(ATMP, na.rm=TRUE),
    SD_TEMP = sd(ATMP, na.rm=TRUE),
    SD_TEMP_PERC = SD_TEMP/AVG_TEMP*100,
    T_MAX = max(ATMP, na.rm = TRUE),
    T_MIN = min(ATMP, na.rm = TRUE),
    nr = n()
  ) %>%
  ggplot(data = .) +
  geom_point(aes(x = YEAR, y = T_MAX), color = "#051923", bg = "#051923", pch = 24) + #ADD 1 
  geom_point(aes(x = YEAR, y = T_MIN), color = "#051923", bg = "#051923", pch = 25) + #ADD 1 
  geom_segment(aes(x = YEAR, xend = YEAR, y = T_MIN, yend = T_MAX), color = "#00A6FB") +
  geom_line(aes(x = YEAR, y = (AVG_TEMP + SD_TEMP )), color = "#006494") +
  geom_line(aes(x = YEAR, y = (AVG_TEMP - SD_TEMP )), color = "#006494") +
  geom_line(aes(x = YEAR, y = AVG_TEMP), color="#006494") +
  geom_point(aes(x = YEAR, y = AVG_TEMP), size=3, color="#003554") +
  #geom_line() + #this cover the points! better before
  scale_x_continuous(breaks = seq(2011,2020,1), limits = c(2011,2020)) +
  xlab(element_blank()) + #https://ggplot2.tidyverse.org/reference/theme.html
  ylab("Atmospheric Temperature (°C)") +
  theme_classic() +
  theme(panel.grid.major = element_line(colour = "#CCCCCC"))

p2 <- tmp_df %>%
  filter(TIME_LOCAL_SOLAR >= lubridate::ymd("2011-01-01")) %>%
  ggplot(.) +
  geom_boxplot(aes(x = YEAR, y = ATMP, group = YEAR)) +
  scale_x_continuous(breaks = seq(2011,2020,1), limits = c(2010.5,2020.5)) +
  xlab(element_blank()) + #https://ggplot2.tidyverse.org/reference/theme.html
  ylab("Atmospheric Temperature (°C)") +
  theme_classic() +
  theme(panel.grid.major = element_line(colour = "#CCCCCC"))

p1 / p2
## Warning: Removed 34964 rows containing non-finite values (stat_boxplot).

#p1 + p2
#p1 | p2

Some further readings